Towards Effective Log Summarization
Authors
Abstract
Database access logs are the go-to resource for tasks ranging from performance tuning to security auditing. Unfortunately, they are also large and unwieldy, and it can be difficult for a human analyst to divine the intent behind typical queries in the log. With an eye towards creating tools for ad-hoc exploration of queries by intent, we analyze techniques for clustering queries by intent. Although numerous techniques have already been developed for log summarization, they target specific goals such as query recommendation or storage layout optimization rather than the fuzzier notion of query intent. In this paper, we first survey a variety of log summarization techniques, focusing on a class of approaches that use query similarity metrics. We then propose DCABench, a benchmark that evaluates how well query similarity metrics capture query intent, and use it to evaluate three similarity metrics. DCABench uses student answers to query construction assignments to capture a wide range of distinct SQL queries that all have the same intent. Next, we propose and evaluate a query regularization process that standardizes query representations, significantly improving the effectiveness of the three similarity metrics tested. Finally, we propose an entirely new similarity metric based on the Weisfeiler-Lehman (WL) approximate graph isomorphism algorithm, which identifies salient features of a graph or, in our case, of the abstract syntax tree of a query. We show experimentally that distances in WL-feature space capture a meaningful notion of similarity, while still retaining competitive performance.
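As a rough illustration of the idea of distances in WL-feature space, the following Python sketch applies Weisfeiler-Lehman label refinement to toy query syntax trees and compares the resulting feature counts. It is not the authors' implementation: the tree encoding, the node labels, the names `wl_features` and `wl_distance`, the treatment of the tree as an unordered undirected graph, and the choice of Euclidean distance are all assumptions made for this example.

```python
from collections import Counter


def wl_features(nodes, iterations=2):
    """Count WL labels over a tree given as {node_id: (label, [child_ids])}.

    Assumption: edges are treated as undirected and child order is ignored.
    """
    # Build an undirected adjacency list from the parent -> children mapping.
    neighbors = {n: [] for n in nodes}
    for n, (_, children) in nodes.items():
        for c in children:
            neighbors[n].append(c)
            neighbors[c].append(n)

    labels = {n: label for n, (label, _) in nodes.items()}
    features = Counter(labels.values())
    for _ in range(iterations):
        # Each node's new label combines its own label with the sorted
        # multiset of its neighbors' labels.
        labels = {
            n: labels[n] + "(" + ",".join(sorted(labels[m] for m in neighbors[n])) + ")"
            for n in nodes
        }
        features.update(labels.values())
    return features


def wl_distance(a, b):
    """Euclidean distance between two WL feature-count vectors."""
    return sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)) ** 0.5


# Hypothetical toy trees for: SELECT name FROM users WHERE id = <const>
q1 = {
    0: ("SELECT", [1, 2, 3]),
    1: ("COL:name", []),
    2: ("TABLE:users", []),
    3: ("WHERE", [4]),
    4: ("=", [5, 6]),
    5: ("COL:id", []),
    6: ("CONST", []),
}
# Same query with children listed in a different order.
q2 = dict(q1)
q2[0] = ("SELECT", [3, 2, 1])
q2[4] = ("=", [6, 5])
# Same shape, different table.
q3 = dict(q1)
q3[2] = ("TABLE:orders", [])

d_same = wl_distance(wl_features(q1), wl_features(q2))
d_diff = wl_distance(wl_features(q1), wl_features(q3))
print(d_same, d_diff)
```

In this sketch, the two orderings of the same query land on the same feature vector, while changing the table name moves the query away from it. Whether child order should be ignored is a design choice that depends on the query language and on how the queries are regularized.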
Similar resources
Summarization: Some Problems and Methods
The provision of summaries is of crucial importance for fully effective retrieval of information, but research on summarization has been relatively neglected. After an outline of the basic linguistic and cognitive complexities of text understanding and summarizing, the paper reviews some current projects towards automating various aspects of summarization, and discusses future prospects.
Towards a logical framework for OLAP query log manipulation
This paper proposes a manipulation language tailored for OLAP query logs, stemming from the relational algebra. This language is based on binary relations over sequences of queries (called sessions). We propose two such relations that allow sessions to be grouped and ordered. Examples of expressions in this language illustrate its interest for various user-centric approaches, like query recommendation o...
The Pareto Principle Is Everywhere: Finding Informative Sentences for Opinion Summarization Through Leader Detection
Most previous work on opinion summarization focuses on summarizing the distribution of sentiment polarity towards different aspects of an entity (e.g., battery life and screen of a mobile phone). However, users may want more than this kind of opinion summarization. Besides such coarse-grained summarization on aspects, one may prefer to read detailed but concise text of the opinion data for more...
Measuring Importance and Query Relevance in Topic-focused Multi-document Summarization
The increasing complexity of summarization systems makes it difficult to analyze exactly which modules make a difference in performance. We carried out a principled comparison between the two most commonly used schemes for assigning importance to words in the context of query focused multi-document summarization: raw frequency (word probability) and log-likelihood ratio. We demonstrate that the...
Generate Compressed Sentences with Stanford Typed Dependencies towards Abstractive Summarization
In this paper, we implement the sentence generation process for abstractive summarization proposed by Genest and Lapalme (2010). We simply use Stanford Typed Dependencies to extract information items and generate multiple compressed sentences via a Natural Language Generation engine. Then we follow LexRank based sentence ranking combined with greedy sentence selection to build...
Publication date: 2016